2 research outputs found

    Adaptation of K-means-type algorithms to the Grassmann manifold, An

    Get PDF
    2019 Spring.Includes bibliographical references.The Grassmann manifold provides a robust framework for analysis of high-dimensional data through the use of subspaces. Treating data as subspaces allows for separability between data classes that is not otherwise achieved in Euclidean space, particularly with the use of the smallest principal angle pseudometric. Clustering algorithms focus on identifying similarities within data and highlighting the underlying structure. To exploit the properties of the Grassmannian for unsupervised data analysis, two variations of the popular K-means algorithm are adapted to perform clustering directly on the manifold. We provide the theoretical foundations needed for computations on the Grassmann manifold and detailed derivations of the key equations. Both algorithms are then thoroughly tested on toy data and two benchmark data sets from machine learning: the MNIST handwritten digit database and the AVIRIS Indian Pines hyperspectral data. Performance of algorithms is tested on manifolds of varying dimension. Unsupervised classification results on the benchmark data are compared to those currently found in the literature

    k-simplex volume optimizing projection algorithms for high-dimensional data sets

    Get PDF
    2021 Spring.Includes bibliographical references.Many applications produce data sets that contain hundreds or thousands of features, and consequently sit in very high dimensional space. It is desirable for purposes of analysis to reduce the dimension in a way that preserves certain important properties. Previous work has established conditions necessary for projecting data into lower dimensions while preserving pairwise distances up to some tolerance threshold, and algorithms have been developed to do so optimally. However, although similar criteria for projecting data into lower dimensions while preserving k-simplex volumes has been established, there are currently no algorithms seeking to optimally preserve such embedded volumes. In this work, two new algorithms are developed and tested: one which seeks to optimize the smallest projected k-simplex volume, and another which optimizes the average projected k-simplex volume
    corecore